TeraSpeech'2000: a 10,000-speaker database
Abstract
TeraSpeech is a bilingual (English and French) database developed in partnership with a French museum, le Musée des Sciences et de l’Industrie in Paris. The data are collected through a vocal-signature demonstration. To validate a quality plan, a scenario for the demonstration was designed and various protocols were developed. We present the quality plan together with the solutions we found for its validation (the scenario and the protocols), and we give the statistics of TeraSpeech. Three perspectives are examined: validation, exploitation, and research. After a single year of the vocal-signature exhibition, TeraSpeech’2000 comprises more than 30,000 sentences recorded from more than 10,000 visitors. The museum's exhibition on acoustics is planned to run for ten years, so TeraSpeech is expected to grow into a collection of more than 100,000 speakers recorded over the same sound acquisition channel.